{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Malay"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "This tutorial is available as an IPython notebook at [Malaya/example/dictionary-malay](https://github.com/huseinzol05/Malaya/tree/master/example/dictionary-malay).\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### requirements"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure you already installed,\n",
    "\n",
    "```bash\n",
    "pip3 install requests beautifulsoup4\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ['CUDA_VISIBLE_DEVICES'] = ''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/ubuntu/dev/malaya/malaya/tokenizer.py:202: FutureWarning: Possible nested set at position 3361\n",
      "  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n",
      "/home/ubuntu/dev/malaya/malaya/tokenizer.py:202: FutureWarning: Possible nested set at position 3879\n",
      "  self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n"
     ]
    }
   ],
   "source": [
    "import malaya"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### DBP\n",
    "\n",
    "Query from https://prpm.dbp.gov.my/cari1?keyword=,\n",
    "\n",
    "```python\n",
    "def keyword_dbp(word, parse: bool = False):\n",
    "    \"\"\"\n",
    "    crawl https://prpm.dbp.gov.my/cari1?keyword= to check a word is a malay word.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    word: str\n",
    "    parse: bool, optional (default=False)\n",
    "        if True, will parse using BeautifulSoup.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    result: Dict\n",
    "    \"\"\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.keyword_dbp('ayam')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.keyword_dbp('ayamaaaaa')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'definisi': ['Definisi : sj ikan; ~ hutan a) Euxiphippops sextriatus; b) Pomacanthus annularis; ~ laut, Abalistes spp.\\xa0(Kamus Dewan Edisi Keempat)',\n",
       "  'Definisi : beberapa jenis binatang (yg bentuk tubuhnya seakan-akan burung tetapi tidak pandai terbang) yg biasanya dipelihara, Gallus gallus. ~ belanda sj ayam yg besar, Meleagris gallopavo. ~ beroga (denak, hutan) sj ayam liar, Gallus bankiva. ~ biring ayam jantan yg kuning kakinya. ~ bulu balik ayam yg bulunya terbalik. ~ dara ayam betina yg hampir bertelur. ~ katik ayam yg kecil. ~ percik ayam panggang yg disaluti sos atau kuah yg dibuat drpd santan dan rempah-ratus. ~ sabung ayam yg dipelihara utk disabung. ~ serama sj ayam peliharaan yg kecil, jinak, berbulu cantik dan berkaki pendek. ~ tambatan ki orang yg dianggap hebat dan diharapkan dpt membawa kemenangan dlm sesuatu perlawanan, mis bola sepak dan bola jaring.\\xa0(Kamus Pelajar Edisi Kedua)'],\n",
       " 'tesaurus': None}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.keyword_dbp('ayam', parse = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.keyword_dbp('ayamaaaaa', parse = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Wiktionary\n",
    "\n",
    "Query from https://en.wiktionary.org/wiki/,\n",
    "\n",
    "```python\n",
    "def keyword_wiktionary(\n",
    "    word,\n",
    "    acceptable_lang: List[str] = ['brunei malay', 'malay'],\n",
    "):\n",
    "    \"\"\"\n",
    "    crawl https://en.wiktionary.org/wiki/ to check a word is a malay word.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    word: str\n",
    "    acceptable_lang: List[str], optional (default=['brunei malay', 'malay'])\n",
    "        acceptable languages in wiktionary section.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    result: Dict\n",
    "    \"\"\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'brunei malay': [{'etymology': 'From Proto-Malayic *hayam, from Proto-Malayo-Polynesian *qayam.\\n',\n",
       "   'definitions': [{'partOfSpeech': 'noun',\n",
       "     'text': ['ayam', 'chicken (bird)', 'chicken (meat)'],\n",
       "     'relatedWords': [],\n",
       "     'examples': []}],\n",
       "   'pronunciations': {'text': ['IPA: /ajam/',\n",
       "     '(Kedayan) IPA: /hajam/',\n",
       "     'Hyphenation: a‧yam'],\n",
       "    'audio': []}}],\n",
       " 'malay': [{'etymology': 'From hayam, from Proto-Malayic *hayam, from Proto-Malayo-Polynesian *qayam.\\n',\n",
       "   'definitions': [{'partOfSpeech': 'noun',\n",
       "     'text': ['ayam (Jawi spelling ايم\\u200e, plural ayam-ayam, informal 1st possessive ayamku, 2nd possessive ayammu, 3rd possessive ayamnya)',\n",
       "      'chicken (bird)',\n",
       "      'chicken (meat)'],\n",
       "     'relatedWords': [{'relationshipType': 'synonyms',\n",
       "       'words': ['manuk / مانوق\\u200e']}],\n",
       "     'examples': []}],\n",
       "   'pronunciations': {'text': ['IPA: /ajam/', 'Rhymes: -ajam, -jam, -am'],\n",
       "    'audio': []}}]}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.keyword_wiktionary('ayam')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'brunei malay': [{'etymology': '',\n",
       "   'definitions': [],\n",
       "   'pronunciations': {'text': [], 'audio': []}}],\n",
       " 'malay': [{'etymology': '',\n",
       "   'definitions': [],\n",
       "   'pronunciations': {'text': [], 'audio': []}}]}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.keyword_wiktionary('ayamaaaa')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Check a word is a malay word\n",
    "\n",
    "```python\n",
    "def is_malay(word, stemmer=None):\n",
    "    \"\"\"\n",
    "    Check a word is a malay word.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    word: str\n",
    "    stemmer: Callable, optional (default=None)\n",
    "        a Callable object, must have `stem_word` method.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    result: bool\n",
    "    \"\"\"\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.is_malay('ayam')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.is_malay('sakitkan')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.is_malay('tersakitkan')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "stemmer = malaya.stem.sastrawi()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "malaya.dictionary.is_malay('tersakitkan', stemmer = stemmer)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
